42 research outputs found

    Kernal based speaker specific feature extraction and its applications in iTaukei cross language speaker recognition

    Feature extraction and classification algorithms based on nonlinear kernel methods are a popular new direction of research in machine learning. This paper considers their practical application in an iTaukei automatic speaker recognition (ASR) system for cross-language speaker recognition. Nonlinear speaker-specific feature extraction methods, namely kernel principal component analysis (KPCA), kernel independent component analysis (KICA), and kernel linear discriminant analysis (KLDA), are summarized. Their effect on subsequent classification was tested in conjunction with Gaussian mixture model (GMM) learning algorithms; in most cases, the transformations were found to have a beneficial effect on classification performance, with the best results achieved by KLDA. The performance of the ASR system is evaluated across a wide range of speech quality, starting from clear speech, using the ATR Japanese C language corpus and a self-recorded iTaukei corpus. For 6 s of speech from the ATR Japanese C language corpus, the ASR efficiency of the KLDA, KICA, and KPCA techniques is 99.7%, 99.6%, and 99.1%, and the equal error rates (EER) are 1.95%, 2.31%, and 3.41%, respectively. The EER improvement of the KLDA-based ASR system compared with KICA and KPCA is 4.25% and 8.51%, respectively.
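
As an illustration of the kernel feature extraction step named in this abstract, the following is a minimal NumPy sketch of kernel PCA with a Gaussian (RBF) kernel. It is not the paper's implementation: the kernel choice, the `gamma` value, the 13-dimensional random "frames", and the function names are all assumptions made for the example. In a pipeline like the one described, the projected features would then be passed to a GMM classifier.

```python
import numpy as np

def rbf_kernel(X, Y, gamma):
    """Gaussian (RBF) kernel matrix between the rows of X and the rows of Y."""
    sq_dist = (X**2).sum(1)[:, None] + (Y**2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))

def kpca_fit_transform(X, n_components, gamma):
    """Project training frames X (n_frames x n_dims) onto the leading
    kernel principal components."""
    n = X.shape[0]
    K = rbf_kernel(X, X, gamma)
    # Center the kernel matrix, i.e. center the data in feature space.
    one_n = np.full((n, n), 1.0 / n)
    K_centered = K - one_n @ K - K @ one_n + one_n @ K @ one_n
    # eigh returns eigenvalues in ascending order; keep the largest ones.
    eigvals, eigvecs = np.linalg.eigh(K_centered)
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # For training points, the projection onto component k is sqrt(lambda_k) * v_k.
    return eigvecs * np.sqrt(np.maximum(eigvals, 0.0))

# Hypothetical stand-in for cepstral feature frames (random data, not speech).
rng = np.random.default_rng(0)
frames = rng.normal(size=(200, 13))
Z = kpca_fit_transform(frames, n_components=5, gamma=0.05)
```

The projected columns of `Z` are zero-mean and mutually uncorrelated, which is the property that makes them convenient inputs for a downstream statistical model.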

    Forensic and Automatic Speaker Recognition System

    Automatic speaker recognition (ASR) systems have emerged as an important means of confirming identity in many business and e-commerce applications, as well as in forensics and law enforcement. Specialists trained in forensic recognition can perform this task far better by examining a set of acoustic, prosodic, and semantic attributes, an approach referred to as structured listening. Algorithm-based systems for forensic speaker recognition have been developed by physicists and forensic linguists to reduce the probability of contextual bias or preconceived understanding when assessing a reference model against an unknown audio sample and any suspected individual. Many researchers continue to develop automatic algorithms in signal processing and machine learning, so that improved performance can effectively establish the speaker’s identity, with the automatic system performing on par with human listeners. In this paper, I examine the literature on the identification of speakers by machines and humans, emphasizing the key technical trends that have emerged in automatic speaker recognition technology over the last decade. I focus on many aspects of ASR systems, including speaker-specific features, speaker models, standard assessment data sets, and performance metrics.

    Bayesian distance metric learning and its application in automatic speaker recognition systems

    This paper proposes a state-of-the-art automatic speaker recognition (ASR) system based on Bayesian distance metric learning as a feature extractor. In this modeling, I explore constraints on the distance between modified and simplified i-vector pairs from the same speaker and from different speakers. The distance metric is approximated by a weighted covariance matrix built from the leading eigenvectors of the covariance matrix, which is used to estimate the posterior distribution of the distance metric. Given a speaker label, I select the data pairs of different speakers with the highest cosine scores to form a set of speaker constraints. This set captures the most discriminative variability between the speakers in the training data. The Bayesian distance learning approach achieves better performance than the most advanced methods. Furthermore, it is less sensitive to normalization than cosine scoring, and it is very effective when training data are limited. The modified supervised i-vector based ASR system is evaluated on the NIST SRE 2008 database. The best performance of the combined cosine score, an EER of 1.767%, is obtained using LDA200 + NCA200 + LDA200, and the best performance of Bayes_dml, an EER of 1.775%, is obtained using LDA200 + NCA200 + LDA100. Bayesian_dml overcomes the combined norm of cosine scores and is the best result reported for the short2-short3 condition of the NIST SRE 2008 data.
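
The constraint selection step described in this abstract (picking different-speaker pairs with the highest cosine scores) can be sketched as follows. This is a simplified illustration, not the paper's procedure: it ranks pairs globally rather than per speaker, and the function name `hardest_impostor_pairs`, the toy 4-dimensional vectors, and the speaker labels are hypothetical.

```python
import numpy as np

def hardest_impostor_pairs(ivectors, speaker_ids, top_k):
    """Return the top_k different-speaker pairs (i, j, score) whose
    i-vectors have the highest cosine similarity."""
    speaker_ids = np.asarray(speaker_ids)
    # Length-normalize so that a dot product equals the cosine score.
    unit = ivectors / np.linalg.norm(ivectors, axis=1, keepdims=True)
    scores = unit @ unit.T
    # Enumerate each unordered pair once (upper triangle, no diagonal).
    i_idx, j_idx = np.triu_indices(len(speaker_ids), k=1)
    different = speaker_ids[i_idx] != speaker_ids[j_idx]
    i_idx, j_idx = i_idx[different], j_idx[different]
    pair_scores = scores[i_idx, j_idx]
    # Highest-scoring impostor pairs are the most confusable ones.
    order = np.argsort(pair_scores)[::-1][:top_k]
    return [(int(i_idx[o]), int(j_idx[o]), float(pair_scores[o])) for o in order]

# Toy data: random vectors standing in for i-vectors of three speakers.
rng = np.random.default_rng(1)
toy_ivectors = rng.normal(size=(6, 4))
toy_speakers = ["a", "a", "b", "b", "c", "c"]
pairs = hardest_impostor_pairs(toy_ivectors, toy_speakers, top_k=3)
```

Pairs selected this way are the ones a plain cosine scorer is most likely to confuse, which is why they are natural candidates for "must be far apart" constraints in a learned metric.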

    High level speaker specific features modeling in automatic speaker recognition system

    Spoken words convey several levels of information. At the primary level, speech conveys words or spoken messages, but at the secondary level, it also reveals information about the speaker. This work applies statistical speaker modeling techniques to high-level speaker-specific features that express the characteristic sound of the human voice. Hidden Markov model (HMM), Gaussian mixture model (GMM), and linear discriminant analysis (LDA) models are used to build automatic speaker recognition (ASR) systems that are computationally inexpensive and can recognize speakers regardless of what is said. The performance of the ASR system is evaluated across a wide range of speech quality, starting from clear speech, using the standard TIMIT speech corpus. The ASR efficiency of the HMM, GMM, and LDA based modeling techniques is 98.8%, 99.1%, and 98.6%, and the equal error rate (EER) is 4.5%, 4.4%, and 4.55%, respectively. The EER improvement of the GMM-based ASR system compared with HMM and LDA is 4.25% and 8.51%, respectively.
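
The equal error rate (EER) quoted in this abstract is the operating point at which the false acceptance rate equals the false rejection rate. A minimal sketch of how it can be estimated from verification scores is shown below; the threshold sweep, the function name, and the toy scores are assumptions for illustration and are not taken from the paper's evaluation setup.

```python
import numpy as np

def equal_error_rate(target_scores, impostor_scores):
    """Estimate the EER by sweeping a threshold over all observed scores."""
    target_scores = np.asarray(target_scores, dtype=float)
    impostor_scores = np.asarray(impostor_scores, dtype=float)
    thresholds = np.sort(np.concatenate([target_scores, impostor_scores]))
    # False rejection: a genuine trial scoring below the threshold.
    frr = np.array([(target_scores < t).mean() for t in thresholds])
    # False acceptance: an impostor trial scoring at or above the threshold.
    far = np.array([(impostor_scores >= t).mean() for t in thresholds])
    # Take the threshold where the two error rates are closest.
    best = np.argmin(np.abs(far - frr))
    return float(0.5 * (far[best] + frr[best]))

# Toy scores: genuine trials tend to score higher than impostor trials.
rng = np.random.default_rng(2)
genuine = rng.normal(loc=2.0, scale=1.0, size=500)
impostor = rng.normal(loc=0.0, scale=1.0, size=500)
eer = equal_error_rate(genuine, impostor)
```

With perfectly separated score distributions this estimate is 0, and with identical distributions it approaches 0.5, so lower values indicate a better verification system.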

    Environmental Energy Harvesting Techniques to Power Standalone IoT-Equipped Sensor and Its Application in 5G Communication

    In recent years, the Internet of Things (IoT) has gained a lot of attention due to its significant deployment to meet global demand for smart cities. Environmental energy harvesting devices, which use ambient energy to generate electricity, could be a viable option in the near future for charging or powering stand-alone IoT sensors and electronic devices. The key advantages of such energy harvesting devices are that they are environmentally friendly, portable, wireless, cost-effective, and compact. It is important to propose and fabricate improved, high-quality, economical, and efficient energy harvesting systems to overcome the problem of supplying power to tiny IoT devices at remote locations. In this article, various mechanisms for harvesting renewable energy that can locally power sensor-enabled IoT devices, as well as their associated wireless sensor networks (WSNs), are reviewed. These methods are discussed in terms of their advantages and applications, as well as their drawbacks and limitations. Furthermore, a methodological performance analysis for the period 2005 to 2020 is surveyed in order to identify the methods that delivered high output power for each device, and the outstanding breakthrough performances of each of these micro-power generators during this period are emphasized. According to the research, thermoelectric modules can convert up to 2500×10^(-3) W/cm^2, thermo-photovoltaic 10.9%, piezoelectric 10,000 mW/cm^3, and microbial fuel cells 6.86 W/m^2 of energy. Doi: 10.28991/esj-2021-SP1-08

    The role of speech technology in biometrics, forensics and man-machine interface

    Optimism is growing day by day that in the near future our society will witness the man-machine interface (MMI) using voice technology. Computer manufacturers are building voice recognition sub-systems into their new product lines. Although speech technology based MMI techniques have been widely used before, they still need to gather and apply deep knowledge of spoken language and performance during electronic machine-based interaction. Biometric recognition refers to a system that is able to identify individuals based on their behavioral and biological characteristics. Following the success of fingerprints in forensic science and law enforcement applications, and with growing concerns relating to border control, banking access fraud, machine access control, and IT security, there has been great interest in the use of fingerprints and other biological traits for automatic recognition. It is not surprising to see that biometric systems are playing an important role in all areas of our society. Biometric applications include smartphone security access, mobile payment, international border control, national citizen registers, and reserve facilities. The use of MMI through speech technology, which includes automated speech/speaker recognition and natural language processing, has a significant impact on all existing businesses based on personal computer applications. With the help of powerful and affordable microprocessors and artificial intelligence algorithms, a human being can talk to the machine to drive and control all computer-based applications. Today's applications show a small preview of a rich future for MMI based on voice technology, which will ultimately replace the keyboard and mouse with the microphone for easy access and make the machine more intelligent.

    High Level Speaker Specific Features as an Efficiency Enhancing Parameters in Speaker Recognition System

    In this paper, I present high-level speaker-specific feature extraction that considers intonation, linguistic rhythm, linguistic stress, and prosodic features taken directly from speech signals. I assume that rhythm is related to language units such as syllables and appears as changes in measurable parameters such as fundamental frequency (F0), duration, and energy. In this work, syllable-type features are selected as the basic unit for expressing the prosodic features. The approximate segmentation of continuous speech into syllable units is achieved by automatically locating the vowel starting point. Knowledge of the high-level speaker-specific characteristics is used as a reference for extracting the prosodic features of the speech signal. High-level speaker-specific features extracted using this method may be useful in applications such as speaker recognition, where explicit phoneme/syllable boundaries are not readily available. The efficiency of these features for automatic speaker recognition was evaluated on the TIMIT and HTIMIT corpora, with TIMIT, originally sampled at 16 kHz, downsampled to 8 kHz. In the experiments, the baseline discriminative system and the HMM system are trained on the TIMIT corpus with a set of 48 phonemes. The proposed ASR system shows efficiency improvements of 1.99%, 2.10%, 2.16%, and 2.19% compared to the traditional ASR system for and of 16 kHz TIMIT utterances.
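
To make the three measurable prosodic parameters in this abstract concrete, here is a minimal sketch that computes fundamental frequency, duration, and energy for one segment. It assumes the syllable-like segment has already been cut out (the vowel onset detection is not shown), uses a simple autocorrelation pitch estimator that is an assumption rather than the paper's method, and runs on a synthetic 200 Hz tone instead of real speech.

```python
import numpy as np

def frame_f0_autocorr(frame, sample_rate, f0_min=60.0, f0_max=400.0):
    """Estimate fundamental frequency (Hz) from the autocorrelation peak
    within a plausible pitch-lag range."""
    frame = frame - frame.mean()
    # Keep non-negative lags only.
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(sample_rate / f0_max)
    lag_max = int(sample_rate / f0_min)
    lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
    return float(sample_rate / lag)

def prosodic_features(segment, sample_rate):
    """Return F0, duration, and mean energy for one syllable-like segment."""
    return {
        "f0_hz": frame_f0_autocorr(segment, sample_rate),
        "duration_s": len(segment) / sample_rate,
        "energy": float(np.mean(segment ** 2)),
    }

# Synthetic stand-in for a voiced segment: 50 ms of a 200 Hz sine at 16 kHz.
sample_rate = 16000
t = np.arange(int(0.05 * sample_rate)) / sample_rate
segment = np.sin(2 * np.pi * 200.0 * t)
features = prosodic_features(segment, sample_rate)
```

Real speech would need voicing detection and a more robust pitch tracker, but the same three numbers per segment are what a syllable-level prosodic feature vector would be built from.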